# Reinforcement Learning Fine-tuning

**Finetuned Tamil Llama 7B** · Jaggu05 · 73 · 1
A supervised fine-tuning (SFT) model built with the Transformers library, designed to improve the performance of a Tamil-adapted Llama 7B base model.
*Large Language Model · Transformers*
**Qwen3 0.6B TLDR LoRA** (Apache-2.0) · phh · 56 · 0
A LoRA adapter for Qwen3-0.6B, an open-source Transformer-based language model with 600 million parameters, suited to natural language processing tasks such as text summarization.
*Text Generation*
**Qwen 2.5 7B Base RAG RL** · XXsongLALA · 859 · 7
Qwen-2.5-7B-base-RAG-RL is a 7B-parameter large language model, trained from scratch on an undisclosed dataset, that incorporates Retrieval-Augmented Generation (RAG) and Reinforcement Learning (RL).
*Large Language Model · Transformers*
**Phi 4 Reasoning Plus** (MIT) · microsoft · 19.83k · 261
Phi-4-reasoning-plus is an open-weight reasoning model from Microsoft Research, built on Phi-4 and optimized with supervised fine-tuning and reinforcement learning, focusing on advanced reasoning in mathematics, science, and coding.
*Large Language Model · Transformers · Supports Multiple Languages*
**Deepcoder 1.5B Preview AWQ** (MIT) · adriabama06 · 72 · 2
DeepCoder-1.5B-Preview is a code-reasoning large language model fine-tuned from DeepSeek-R1-Distilled-Qwen-1.5B via distributed reinforcement learning, capable of handling longer context lengths.
*Large Language Model · Transformers · English*
**Deephermes ToolCalling Specialist Atropos** · NousResearch · 64 · 4
An experimental model from Nous Research, fine-tuned with the Atropos reinforcement learning framework to improve the tool-calling performance of Llama-3.1 8B in reasoning mode.
*Large Language Model · Transformers · English*
**Qwen2.5 0.5B Instruct Gensyn Swarm Fierce Placid Whale** · gangchen · 3,053 · 2
A fine-tuned version of Gensyn/Qwen2.5-0.5B-Instruct, trained with the TRL framework and the GRPO algorithm.
*Large Language Model · Transformers*
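GRPO training with TRL, as used for the Gensyn swarm model above, optimizes a policy against scalar reward functions that score each sampled completion. A minimal sketch of such a reward function follows; the format-checking rule is an illustrative assumption, not Gensyn's actual reward:

```python
import re

# Toy GRPO-style reward: assign each sampled completion a scalar score.
# TRL's GRPOTrainer accepts callables of this shape via `reward_funcs`;
# the <answer>...</answer> format rule here is an illustrative assumption.
def format_reward(completions, **kwargs):
    pattern = re.compile(r"<answer>.+?</answer>", re.DOTALL)
    return [1.0 if pattern.search(c) else 0.0 for c in completions]

rewards = format_reward(["<answer>42</answer>", "no tags here"])
print(rewards)  # prints [1.0, 0.0]
```

GRPO then normalizes these rewards within each group of completions sampled for the same prompt, so no separate value model is needed.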
**Notbad v1.0 Mistral 24B** (Apache-2.0) · notbadai · 29 · 5
Notbad v1.0 Mistral 24B focuses on mathematical and Python programming reasoning; it is based on Mistral-Small-24B-Instruct-2501 and further trained with reinforcement learning.
*Large Language Model · Transformers*
**EXAONE 3.5 2.4B Fine Tuning** · good593 · 65 · 2
A fine-tuned model based on EXAONE-3.5-2.4B, built with Hugging Face's Transformers library and supporting various natural language processing tasks.
*Large Language Model · Transformers*
**Qwen2.5 0.5B Instruct** (Apache-2.0) · Gensyn · 2.4M · 5
A 0.5B-parameter instruction fine-tuned model designed for the Gensyn reinforcement learning swarm, supporting local fine-tuning training.
*Large Language Model · Transformers · English*
**Alignprop Trl Aesthetics** (Apache-2.0) · mihirpd · 15 · 1
A text-to-image model fine-tuned from Stable Diffusion v1.5 with an aesthetic reward function on animal datasets, trained via reward backpropagation.
*Image Generation*
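Reward backpropagation, as used by the AlignProp model above, differentiates a reward signal directly through the generator instead of relying on a policy-gradient estimator. A one-variable toy sketch of the idea follows; everything here (the scalar "generator" parameter, the quadratic "aesthetic" reward, the step size) is illustrative, not the model's actual implementation:

```python
# Minimal reward-backpropagation sketch: follow the analytic reward
# gradient through the generator parameter by plain gradient ascent.
def reward(x):
    return -(x - 1.0) ** 2      # toy "aesthetic" reward, maximized at x = 1

def reward_grad(x):
    return -2.0 * (x - 1.0)     # analytic gradient of the reward

theta = 0.0                      # stand-in for the generator's parameters
for _ in range(100):
    theta += 0.1 * reward_grad(theta)   # ascend the reward directly
print(round(theta, 3))  # prints 1.0, the reward maximum
```

In the real setting the reward comes from a differentiable aesthetic scorer and the gradient is obtained by autodiff through the full denoising chain, but the update rule is the same ascent on the reward.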
**Vlrm Blip2 Opt 2.7b** (MIT) · sashakunitsyn · 398 · 17
A BLIP-2 OPT-2.7B model fine-tuned with reinforcement learning, capable of generating long, detailed image descriptions.
*Image-to-Text · Transformers · English*
**Codellama 7b Hf ReFT GSM8k** · lqtrung1998 · 38 · 1
Enhances the reasoning generalization of large language models through reinforced fine-tuning (ReFT); fine-tuned from CodeLlama and suited to code generation and comprehension tasks.
*Large Language Model · Transformers*
**Blip Image Captioning Large Mocha** (MIT) · moranyanuka · 188 · 10
The official fine-tuned version of BLIP-Large, optimized on the MS-COCO dataset with the MOCHa reinforcement learning framework to mitigate open-vocabulary caption hallucination.
*Image-to-Text · Transformers*